 panoramic image



DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion

Neural Information Processing Systems

Diffusion-based methods have achieved remarkable results in 2D image and 3D object generation. However, the generation of 3D scenes and even $360^{\circ}$ images remains constrained by the limited number of scene datasets, the complexity of 3D scenes themselves, and the difficulty of generating consistent multi-view images. To address these issues, we first establish a large-scale panoramic video-text dataset containing millions of consecutive panoramic keyframes with corresponding panoramic depths, camera poses, and text descriptions. Then, we propose a novel text-driven panoramic generation framework, termed DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation. Specifically, benefiting from the powerful generative capabilities of Stable Diffusion, we fine-tune a single-view text-to-panorama diffusion model with LoRA on the established panoramic video-text dataset. We further design a spherical epipolar-aware multi-view diffusion model to ensure the multi-view consistency of the generated panoramic images. Extensive experiments demonstrate that DiffPano can generate scalable, consistent, and diverse panoramic images given unseen text descriptions and camera poses.
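As background for the "spherical" part of the abstract above, the sketch below shows one common way to relate pixels of an equirectangular panorama to viewing directions on the unit sphere. It is not taken from DiffPano: the function names `pixel_to_ray` and `ray_to_pixel`, the axis convention (y up, z forward at the image center), and the pixel-coordinate convention are all assumptions chosen for illustration.

```python
import math


def pixel_to_ray(u, v, width, height):
    """Map equirectangular pixel coordinates to a unit direction (x, y, z).

    Assumed convention (hypothetical, not from the paper): u runs left to
    right over longitude [-pi, pi), v runs top to bottom over latitude
    [pi/2, -pi/2], y points up, and z points at the image center.
    """
    lon = (u / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - v / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def ray_to_pixel(x, y, z, width, height):
    """Inverse of pixel_to_ray for any non-zero direction vector."""
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    # Clamp to guard against rounding slightly outside [-1, 1].
    lat = math.asin(max(-1.0, min(1.0, y / norm)))
    u = (lon / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return (u, v)
```

Under this convention, the image center maps to the forward direction and a pixel maps back to itself after a round trip. A multi-view method would additionally need camera poses to relate rays between panoramas, which this sketch does not cover.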



Geometric Exploitation for Indoor Panoramic Semantic Segmentation

Neural Information Processing Systems

PAnoramic Semantic Segmentation (PASS) is an important task in computer vision, as it enables semantic understanding of a 360° environment. Currently, most existing works have focused on addressing the distortion issues in 2D panoramic images without considering the spatial properties of indoor scenes. This restricts PASS methods in perceiving contextual attributes to deal with the ambiguity that arises when working with monocular images. In this paper, we propose a novel approach for indoor panoramic semantic segmentation. Unlike previous works, we consider the panoramic image as a composition of segment groups: oversampled segments, representing planar structures such as floors and ceilings, and under-sampled segments, representing other scene elements.


12 ethereal images from the 2025 Northern Lights Photographer of the Year awards

Popular Science

Photographer Victor Lima used a 12 millimeter fisheye lens to take this panoramic image of the aurora borealis over the Skógafoss waterfall in Iceland. Watching the aurora borealis (in the Northern Hemisphere) or aurora australis (in the Southern Hemisphere) is unforgettable. Photographing them is on a whole other level. Capturing these ribbons of light as they move and twist across the night sky transforms even the darkest winter night into a surreal wonderland.



Panoramic Out-of-Distribution Segmentation

arXiv.org Artificial Intelligence

Panoramic imaging enables capturing 360° images with an ultra-wide Field-of-View (FoV) for dense omnidirectional perception, which is critical to applications such as autonomous driving and augmented reality. However, current panoramic semantic segmentation methods fail to identify outliers, and pinhole Out-of-distribution Segmentation (OoS) models perform unsatisfactorily in the panoramic domain due to pixel distortions and background clutter. To address these issues, we introduce a new task, Panoramic Out-of-distribution Segmentation (PanOoS), with the aim of achieving comprehensive and safe scene understanding. Furthermore, we propose the first solution, POS, which adapts to the characteristics of panoramic images through text-guided prompt distribution learning. Specifically, POS integrates a disentanglement strategy designed to materialize the cross-domain generalization capability of CLIP. The proposed Prompt-based Restoration Attention (PRA) optimizes semantic decoding by prompt guidance and self-adaptive correction, while Bilevel Prompt Distribution Learning (BPDL) refines the manifold of per-pixel mask embeddings via semantic prototype supervision. In addition, to compensate for the scarcity of PanOoS datasets, we establish two benchmarks: DenseOoS, which features diverse outliers in complex environments, and QuadOoS, captured by a quadruped robot with a panoramic annular lens system. Extensive experiments demonstrate superior performance of POS, with AuPRC improving by 34.25% and FPR95 decreasing by 21.42% on DenseOoS, outperforming state-of-the-art pinhole-OoS methods. Moreover, POS achieves leading closed-set segmentation capabilities and advances the development of panoramic understanding. Code and datasets will be available at https://github.com/MengfeiD/PanOoS.
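For readers unfamiliar with the FPR95 metric cited above, the following is a minimal sketch of how it is commonly computed: the false positive rate on inlier pixels at the score threshold where 95% of outlier pixels are detected. The abstract does not describe the exact evaluation protocol, so the function name `fpr_at_95_tpr`, the convention that label 1 means outlier and higher scores mean "more anomalous", and the tie handling are assumptions for illustration.

```python
import math


def fpr_at_95_tpr(scores, labels, tpr_target=0.95):
    """False positive rate at the threshold reaching the target true positive rate.

    Assumptions (hypothetical, not from the paper): labels use 1 for outlier
    pixels and 0 for inlier pixels, higher scores mean more anomalous, and a
    pixel is flagged as an outlier when its score is >= the threshold.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one outlier and one inlier sample")

    # Smallest number of outliers that must be detected to reach the target TPR.
    k = math.ceil(tpr_target * len(positives))
    # The k-th highest outlier score is the loosest threshold that detects them.
    threshold = positives[k - 1]

    false_positives = sum(1 for s in negatives if s >= threshold)
    return false_positives / len(negatives)
```

With four outlier scores `[0.9, 0.8, 0.7, 0.6]` and four inlier scores `[0.65, 0.2, 0.1, 0.05]`, all four outliers must be detected to reach 95%, so the threshold is 0.6 and one of the four inliers is wrongly flagged, giving an FPR of 0.25. Lower values are better, which is why the abstract reports a decrease.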